Computer Based Training
Knowledge Starts with Practice: Knowledge-Aware Exercise Generative Recommendation with Adaptive Multi-Agent Cooperation
Adaptive learning, which requires the in-depth understanding of students' learning processes and rational planning of learning resources, plays a crucial role in intelligent education. However, how to effectively model these two processes and seamlessly integrate them poses significant implementation challenges for adaptive learning. As core learning resources, exercises have the potential to diagnose students' knowledge states during the learning processes and provide personalized learning recommendations to strengthen students' knowledge, thereby serving as a bridge to boost student-oriented adaptive learning. Therefore, we introduce a novel task called Knowledge-aware Exercise Generative Recommendation (KEGR). It aims to dynamically infer students' knowledge states from their past exercise responses and customizably generate new exercises. To achieve KEGR, we propose an adaptive multi-agent cooperation framework, called ExeGen, inspired by the excellent reasoning and generative capabilities of LLM-based AI agents. Specifically, ExeGen coordinates four specialized agents for supervision, knowledge state perception, exercise generation, and quality refinement through an adaptive loop workflow pipeline. More importantly, we devise two enhancement mechanisms in ExeGen: 1) A human-simulated knowledge perception mechanism mimics students' cognitive processes and generates interpretable knowledge state descriptions via demonstration-based In-Context Learning (ICL). In this mechanism, a dualmatching strategy is further designed to retrieve highly relevant demonstrations for reliable ICL reasoning.
AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining
Learning rate is widely regarded as crucial for effective foundation model pretraining. Recent research explores and demonstrates the transferability of learning rate configurations across varying model and dataset sizes, etc. Nevertheless, these approaches are constrained to specific training scenarios and typically necessitate extensive hyperparameter tuning on proxy models. In this work, we propose AdaLRS, a plug-in-and-play adaptive learning rate search algorithm that conducts online optimal learning rate search via optimizing loss descent velocities. We provide theoretical and experimental analyzes to show that foundation model pretraining loss and its descent velocity are both convex and share the same optimal learning rate. Relying solely on training loss dynamics, AdaLRS involves few extra computations to guide the search process, and its convergence is guaranteed via theoretical analysis. Experiments on both LLM and VLM pretraining show that AdaLRS adjusts suboptimal learning rates to the neighborhood of optimum with marked efficiency and effectiveness, with model performance improved accordingly. We also show the robust generalizability of AdaLRS across varying training scenarios, such as different model sizes, training paradigms, base learning rate scheduler choices, and hyperparameter settings.
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis
Graphical user interface (GUI) grounding, the ability to map natural language instructions to specific actions on graphical user interfaces, remains a critical bottleneck in computer use agent development. Current benchmarks oversimplify grounding tasks as short referring expressions, failing to capture the complexity of real-world interactions that require software commonsense, layout understanding, and fine-grained manipulation capabilities. To address these limitations, we introduce OSWORLD-G, a comprehensive benchmark comprising 564 finely annotated samples across diverse task types including text matching, element recognition, layout understanding, and precise manipulation. Additionally, we synthesize and release the largest computer use grounding dataset JEDI, which contains 4 million examples through multi-perspective decoupling of tasks. Our multi-scale models trained on JEDI demonstrate its effectiveness by outperforming existing approaches on ScreenSpot-v2, ScreenSpot-Pro, and our OSWORLD-G. Furthermore, we demonstrate that improved grounding with JEDI directly enhances agentic capabilities of general foundation models on complex computer tasks with state-of-the-art performance, improving from 23% to 51% on OSWorld. Through detailed ablation studies, we identify key factors contributing to grounding performance and verify that combining specialized data for different interface elements enables compositional generalization to novel interfaces.
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing
We introduce ExRec, a general framework for personalized exercise recommendation with semantically-grounded knowledge tracing. Our method builds on the observation that existing exercise recommendation approaches simulate student performance via knowledge tracing (KT) but they often overlook two key aspects: (a) the semantic content of questions and (b) the sequential, structured progression of student learning. To address this, our ExRec presents an end-to-end pipeline, from annotating the KCs of questions and learning their semantic representations to training KT models and optimizing several reinforcement learning (RL) methods. Moreover, we improve standard Q-learning-based continuous RL methods via a tailored model-based value estimation (MVE) approach that directly leverages the components of KT model in estimating cumulative knowledge improvement.
Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings
Mixture-of-Experts (MoEs) achieve scalability by dynamically activating subsets of their components. Yet, understanding how expertise emerges through joint training of gating mechanisms and experts remains incomplete, especially in scenarios without clear task partitions. Motivated by inference costs and data heterogeneity, we study how joint training of gating functions and experts can dynamically allocate domain-specific expertise across multiple underlying data distributions. As an outcome of our framework, we develop an instance tailored specifically to decentralized training scenarios, introducing Dynamically Decentralized Orchestration of MoEs or DDOME. DDOME leverages heterogeneity emerging from distributional shifts across decentralized data sources to specialize experts dynamically. By integrating a pretrained common expert to inform a gating function, DDOMEachieves personalized expert subset selection on-the-fly, facilitating just-in-time personalization.
Accelerating Reinforcement Learning Training Using Simulation Surrogate Models
Ghasemloo, Mohammadmahdi, Eckman, David J., Li, Yaxian
High-fidelity simulation models are widely used to analyze complex stochastic systems, but their high computational cost motivates the development of cheaper surrogate models that approximate the simulation model's input-output relationship. In parallel, reinforcement learning (RL) has emerged as a powerful framework for making online decisions in stochastic environments, with increasing attention being given to the use of simulation models as training environments for RL models. We investigate a class of surrogate models suitable for accelerating RL training in settings where the reward structure, model parameters, or system dynamics change over time and explore their interactions with simulation models and RL models. Through numerical experiments on a stochastic service system modeled via discrete-event simulation, we demonstrate that leveraging surrogate models can substantially accelerate RL training and re-training.
An Effective-Rank Audit of Alignment-Induced Activation Shifts: Confound Control, Constructive Calibration, and Limits
We audit alignment-induced shifts in residual-stream activations of three open-weight instruction-tuned LLMs (Llama-3.1-8B-Instruct, Gemma-2-9B-it, Qwen-2.5-7B-Instruct) using the effective rank of the alignment modification matrix on safety-relevant inputs, rho_eps := rank_eps(M_Ds)/d, which formalizes the single-refusal-direction observation of Arditi et al. (2024) as a continuous quantity. The paper has three contributions. (1) Confound-controlled measurement: a four-variant decomposition (M_naive, M_template, M_aligned, M_DiD) separates chat-template formatting, alignment-stage shift, and the refusal-mediating direction, and recovers the Arditi refusal direction on M_DiD at |cos| in {0.77, 0.86, 0.50} (Llama/Gemma/Qwen); chat-template-controlled rho_eps is {0.0029, 0.0048, 0.0044}, and the centered SVD residual is 4-7x larger. (2) Constructive calibration on a 3-layer MLP across rho_eps in {0.008, 0.17, 0.33, 0.40} exhibits a sweet-spot vs. brittle distinction: mild rank-maximization (lambda=5) buys ablation robustness, while strong regularization at the same nominal rho_eps (lambda=50) does not. rho_eps is a diagnostic for fragility, not a target whose mechanical inflation buys robustness. (3) Limits of rank-based diagnostics: (a) not safety-specific (LRH baseline is 2-3x the safety value); (b) SVD principal ordering does not match causal ordering (Llama u_2 inert despite ranking second; cumulative ablation non-monotone at k=5); (c) the spectral-gap hypothesis required to upgrade the O(rho_eps * d) achievability bound to a matching Mirsky-route lower bound fails empirically (1/90 Llama layer-reference pairs, 0/36 MLP combinations) and structurally (kappa_lb <= 2/(eps * r)). The matching lower bound remains an open problem.
EDU Unlimited turns online learning into a one-time 20 purchase instead of ongoing tuition costs
When you purchase through links in our articles, we may earn a small commission. TL;DR: Score lifetime access to EDU Unlimited for just $19.97 through May 31 (MSRP $600) and unlock 1,000+ online courses across tech, business, creative skills, and more with a single payment. Online learning can get expensive fast, especially when a single course or boot camp can run into the hundreds or even thousands of dollars. EDU Unlimited by StackSkills flips that model by giving you one-time lifetime access to a massive library of 1,000+ courses across a wide range of subjects for just $19.97 during this limited-time offer (MSRP $600). From coding and marketing to creative hobbies like photography or design, StackSkills lets you build your dream skill-set at your own pace, without the pressure.
MasterClass is 50% off today. It's worth it just for the entertainment
When you purchase through links in our articles, we may earn a small commission. MasterClass is 50% off today. Until May 10th, MasterClass annual plans start at $60/year. It's great for casual learners who want high-quality, entertaining courses from big names. With the job market being what it is, there's never been a better time to learn new skills (or brush up on old ones).